NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

DESCG: data encoding scheme classification with GNN in binary analysis

https://doi.org/10.1007/s10515-025-00538-0

Dai, Xushu; Luo, Nanqing; Wang, Haizhou; Wang, Zhilong; Cao, Chen; Liu, Peng (July 2025, Automated Software Engineering)

Abstract Binary analysis, the process of examining software without its source code, plays a crucial role in understanding program behavior, e.g., evaluating the security properties of commercial software, and analyzing malware. One challenging aspect of this process is to classify data encoding schemes, such as encryption and compression, due to the absence of high-level semantic information. Existing approaches either rely on code similarity, which only works for known schemes, or heuristic rules, which lack scalability. In this paper, we propose DESCG, a novel deep learning-based method for automatically classifying four widely employed kinds of data encoding schemes in binary programs: encryption, compression, decompression, and hashing. Our approach leverages dynamic analysis to extract execution traces from binary programs, builds data dependency graphs from these traces, and incorporates critical feature engineering. By combining the specialized graph representation with the Graph Neural Network (GNN), our approach enables accurate classification without requiring prior knowledge of specific encoding schemes. The Evaluation result shows that DESCG achieves 97.7% accuracy and an F1 score of 97.67%, outperforming baseline models. We also conducted an extensive evaluation of DESCG to explore which feature is more important for it and examine its performance and overhead.
more » « less
Free, publicly-accessible full text available July 18, 2026
Tackling imbalanced data in cybersecurity with transfer learning: a case with ROP payload detection

https://doi.org/10.1186/s42400-022-00135-8

Wang, Haizhou; Singhal, Anoop; Liu, Peng (January 2023, Cybersecurity)

Abstract In recent years, deep learning gained proliferating popularity in the cybersecurity application domain, since when being compared to traditional machine learning methods, it usually involves less human efforts, produces better results, and provides better generalizability. However, the imbalanced data issue is very common in cybersecurity, which can substantially deteriorate the performance of the deep learning models. This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity using return-oriented programming payload detection as a case study. We achieved 0.0290 average false positive rate, 0.9705 average F1 score and 0.9521 average detection rate on 3 different target domain programs using 2 different source domain programs, with 0 benign training data sample in the target domain. The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate. Using our approach, the total number of false positives is reduced by 23.16%, and as a trade-off, the number of detected malicious samples decreases by 0.68%.
more » « less
Using deep learning to solve computer security challenges: a survey

https://doi.org/10.1186/s42400-020-00055-5

Choi, Yoon-Ho; Liu, Peng; Shang, Zitong; Wang, Haizhou; Wang, Zhilong; Zhang, Lan; Zhou, Junwei; Zou, Qingtian (December 2020, Cybersecurity)
null (Ed.)
Abstract Although using machine learning techniques to solve computer security challenges is not a new idea, the rapidly emerging Deep Learning technology has recently triggered a substantial amount of interests in the computer security community. This paper seeks to provide a dedicated review of the very recent research works on using Deep Learning techniques to solve computer security challenges. In particular, the review covers eight computer security problems being solved by applications of Deep Learning: security-oriented program analysis, defending return-oriented programming (ROP) attacks, achieving control-flow integrity (CFI), defending network attacks, malware classification, system-event-based anomaly detection, memory forensics, and fuzzing for software security.
more » « less
Full Text Available

Search for: All records